Huamin QU, The Hong Kong University of Science and Technology, huamin@cse.ust.hk
Student Team: YES
Approximately how many hours were spent
working on this submission in total?
20 days * 4 hours/day = 80 hours
May we post your submission in the
Visual Analytics Benchmark Repository after VAST Challenge 2017 is complete? YES
Video
Questions
MC2.1 – Characterize the sensors’ performance and
operation. Are they all working properly
at all times? Can you detect any unexpected
behaviors of the sensors through analyzing the readings they capture? Limit your response to no more than 9 images and 1000 words.
The unexpected behaviors of
sensors are organized into patterns and listed below:
Pattern 1: Missing data
At 00:00 on certain dates, most or all readings of all sensors were missing, regardless of the chemical. Specifically:
2016.04.02 00:00 All missing.
2016.04.06 00:00 All missing.
2016.08.02 00:00 Only sensor 3’s readings of Methylosmolene and AGOC-3A were present.
2016.08.04 00:00 All missing.
2016.08.07 00:00 All missing.
2016.12.02 00:00 All missing.
2016.12.07 00:00 Only the AGOC-3A readings of sensors 6, 7 and 8, the Appluimonia reading of sensor 7 and the Methylosmolene reading of sensor 8 were present.
All of these gaps occurred within the first week of a month.
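One way to surface such gaps is to compare the records against every expected hourly timestamp. The sketch below is illustrative only: the record layout `(chemical, monitor, timestamp, reading)`, the helper name `find_gaps` and the toy values are assumptions, not part of the challenge data description.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def find_gaps(records, start, end):
    """Map each hourly timestamp in [start, end] to its missing (monitor, chemical) pairs.

    `records` is a list of (chemical, monitor, timestamp, reading) tuples;
    this layout is an assumption made for the example.
    """
    expected_pairs = {(monitor, chemical) for chemical, monitor, _, _ in records}
    seen = defaultdict(set)
    for chemical, monitor, ts, _ in records:
        seen[ts].add((monitor, chemical))

    gaps = {}
    ts = start
    while ts <= end:
        missing = expected_pairs - seen.get(ts, set())
        if missing:
            gaps[ts] = missing
        ts += timedelta(hours=1)
    return gaps

# Toy data: monitor 3 reports two chemicals, but nothing at 00:00 on 2016-04-02.
records = [
    ("AGOC-3A", 3, datetime(2016, 4, 1, 23), 0.5),
    ("Methylosmolene", 3, datetime(2016, 4, 1, 23), 0.2),
    ("AGOC-3A", 3, datetime(2016, 4, 2, 1), 0.4),
    ("Methylosmolene", 3, datetime(2016, 4, 2, 1), 0.3),
]
gaps = find_gaps(records, datetime(2016, 4, 1, 23), datetime(2016, 4, 2, 1))
for ts, missing in gaps.items():
    print(ts, sorted(missing))
```

Listing the missing pairs per timestamp, rather than only flagging the timestamp, separates the fully empty hours from the partially empty ones.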
Pattern 2: Conflicts between volatile organic solvents
At certain times, a sensor reported two readings of AGOC-3A while the Methylosmolene reading of the same sensor at the same time was missing. The larger of the two AGOC-3A readings was approximately 10 times the smaller one.
We analyzed when this behavior occurred and found no obvious temporal pattern, except that it happened more often in the daytime and never occurred between 22:00 and 05:00 of the next day.
However, when we combined these records with the wind direction at the time, we found that this behavior happened mostly when the sensor was receiving emissions from one particular company (Kasios Office Furniture). The following figure shows the number of occurrences per wind direction for every sensor. Each circle represents one sensor, and the sensors and companies are drawn at their relative positions.
Figure: Orientation Distribution of Pattern 2 Error
Because AGOC-3A and Methylosmolene are both volatile organic solvents, and AGOC-3A is in fact a substitute for Methylosmolene, we suspect either that the two chemicals react in a way that causes abnormal sensor readings, or that the sensors cannot reliably distinguish one from the other and therefore produce unusual readings when one or both of them are at a high level.
In conclusion, these records are neither pure errors nor random occurrences. Instead, they reflect, to some degree, a high reading of AGOC-3A at the time. To handle them in the following analysis, we keep the smaller AGOC-3A reading and ignore the larger one, because the larger reading would otherwise dominate too much of the subsequent analysis.
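The "keep the smaller reading" rule can be sketched as a group-by over repeated keys. This is a minimal illustration, assuming records shaped as `(monitor, chemical, timestamp, reading)`. The function name `keep_smaller_duplicates` and the toy values are invented for the example.

```python
from collections import defaultdict

def keep_smaller_duplicates(records):
    """Collapse repeated (monitor, chemical, timestamp) readings to their minimum.

    Returns the cleaned mapping and the sorted list of keys that were duplicated.
    """
    grouped = defaultdict(list)
    for monitor, chemical, ts, reading in records:
        grouped[(monitor, chemical, ts)].append(reading)
    cleaned = {key: min(values) for key, values in grouped.items()}
    duplicated = sorted(key for key, values in grouped.items() if len(values) > 1)
    return cleaned, duplicated

# Toy data: sensor 6 reports AGOC-3A twice at 10:00, the larger value about 10x the smaller.
records = [
    (6, "AGOC-3A", "2016-04-03 10:00", 1.2),
    (6, "AGOC-3A", "2016-04-03 10:00", 12.5),
    (6, "AGOC-3A", "2016-04-03 11:00", 0.9),
]
cleaned, duplicated = keep_smaller_duplicates(records)
print(cleaned)
print(duplicated)
```

Returning the duplicated keys alongside the cleaned values keeps the affected timestamps available for the wind-direction analysis.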
Pattern 3: Sudden change of Methylosmolene release
Compared with the other chemicals, the Methylosmolene readings of all sensors showed sharper changes, including some sudden single-point jumps.
The following figure shows histograms of the first derivative of the readings, for each chemical and across all sensors.
Figure: Distribution of the first derivative of the readings of all sensors for all chemicals.
The figure shows that the readings of all sensors have more large jumps for Methylosmolene, which suggests that all sensors are more sensitive to Methylosmolene. To find a pattern in these sudden changes, we drew a plot similar to the one in Pattern 2, showing the number of sudden changes by the wind direction at the time:
Figure: Orientation Distribution of Pattern 3 Error
From the figure we can see that, as in Pattern 2, these sudden changes did not occur randomly. Although the readings may not be as well behaved as the others, they still reflect a high level of Methylosmolene release from Kasios or Roadrunner.
We suspect that the sensors’ response to Methylosmolene is not linear. Because we lack further information, we decided to keep these readings unchanged for the following analysis.
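For evenly spaced readings, the first derivative used in this pattern can be approximated by the difference between consecutive readings. The sketch below assumes one series per sensor and chemical, sorted by time. The bin width and the toy values are arbitrary choices for illustration.

```python
import math
from collections import Counter

def first_differences(readings):
    """Difference between each reading and the one before it."""
    return [b - a for a, b in zip(readings, readings[1:])]

def histogram(values, bin_width):
    """Count values per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    return Counter(math.floor(v / bin_width) for v in values)

# Toy series with a single-point jump at the third reading.
readings = [0.5, 0.75, 6.0, 0.5, 0.25]
diffs = first_differences(readings)
bins = histogram(diffs, bin_width=1.0)
print(diffs)
print(sorted(bins.items()))
```

A single-point jump shows up as one large positive difference immediately followed by one large negative difference, which is why such jumps populate both tails of the histogram.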
Pattern 4: Linearly-increasing reading of sensor 4
For each sensor and each chemical, the minimum reading in each month was very close to zero, except for sensor 4. The monthly minimum reading of sensor 4 increased from month to month for all chemicals.
After further analysis, we found that sensor 4 had a linearly increasing offset in its readings, as shown in the figure below:
Figure: Line Chart of Sensor 4 of all chemicals versus time.
Based on the figure, we suspected that sensor 4 has a linearly increasing offset in its readings of all chemicals. We used linear regression to fit a line to the readings over time for each chemical. For example, the fitted line for Appluimonia is shown below:
Figure: Line Chart of Sensor 4 of Appluimonia versus time with fitting line.
Due to the lack of relevant information, we assume that at the beginning of the data (April 1st) the offset was zero and had just started to increase.
To handle this error, we set the intercept of the fitted line to 0 and subtract the resulting line from the original readings. This gives a steady reading behavior for sensor 4.
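The correction described above can be sketched with an ordinary least-squares fit. This is a minimal illustration under stated assumptions: time is measured in hours since the start of the data, the fit is done per chemical, and the helper names `fit_line` and `remove_drift` and the toy values are invented for the example.

```python
def fit_line(ts, ys):
    """Ordinary least-squares fit of y = slope * t + intercept."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t

def remove_drift(ts, ys):
    """Subtract the zero-intercept drift line slope * t from every reading."""
    slope, _ = fit_line(ts, ys)
    return [y - slope * t for t, y in zip(ts, ys)]

# Toy data: a constant reading of 1.0 plus a drift of 0.5 per hour.
hours = [0, 1, 2, 3]
readings = [1.0, 1.5, 2.0, 2.5]
corrected = remove_drift(hours, readings)
print(corrected)
```

Setting the intercept to zero means only the time-dependent part of the fit is removed, so the baseline level at the start of the data is preserved.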
Pattern 5: Larger Reading of Sensor 3
Figure: Reading Overview for Sensors
As the figure shows, the readings of sensor 3 are the largest among all sensors, for all chemicals. Considering its position (not as close to the companies as sensor 6), its summed readings should not be so large. We therefore think that the reading scale of sensor 3 is problematic. Because this effect does not affect the chemical source discovery in the following questions, the data is left unchanged.
Pattern 6: Large Reading of Sensor 7 on a Certain Wind Direction
Figure: Polar Plot of Sensor 7 for Different Chemical
As the figure shows, sensor 7 has a large average reading for every chemical when the wind comes from one specific direction. Further analysis showed that this pattern results from a few abnormal readings on certain dates in December 2016.
Figure: Polar Plot of Sensor 7 for Different Chemical
As the figure shows, there was a single huge reading at 04:00 on 2016-12-05 for every chemical, which may be one of the readings that caused this pattern. The underlying reason is unknown.
The data is left unchanged, but in the following questions the polar bar plot of sensor 7 is partially ignored.
MC2.2 – Now turn your attention to the chemicals
themselves. Which chemicals are being
detected by the sensor group? What
patterns of chemical releases do you see, as being reported in the data?
Limit your response to no more than 6 images and 500 words.
Figure: Pie Charts of 4 Chemicals for Different Sensors
This graph visualizes, for each chemical, the percentage of the summed readings contributed by each sensor. Every color appears in the pie chart of every chemical, which means that all sensors detected all 4 chemicals, in differing amounts.
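The per-sensor percentages behind such pie charts can be computed as each sensor's share of the summed readings of a chemical. The sketch below assumes records shaped as `(chemical, monitor, reading)`. The function name `sensor_shares` and the toy values are invented for the example.

```python
from collections import defaultdict

def sensor_shares(records):
    """For each chemical, return each monitor's fraction of the summed readings."""
    totals = defaultdict(lambda: defaultdict(float))
    for chemical, monitor, reading in records:
        totals[chemical][monitor] += reading

    shares = {}
    for chemical, by_monitor in totals.items():
        grand_total = sum(by_monitor.values())
        if grand_total == 0:
            continue  # no signal for this chemical; shares are undefined
        shares[chemical] = {m: v / grand_total for m, v in by_monitor.items()}
    return shares

# Toy data: two monitors, one chemical.
records = [
    ("AGOC-3A", 1, 0.5),
    ("AGOC-3A", 1, 0.5),
    ("AGOC-3A", 2, 3.0),
]
shares = sensor_shares(records)
print(shares)
```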
Through analyzing the data, some patterns are observed and listed below:
Pattern 1: Growth Trend
Figure: Line Chart of 4 Chemicals’ Release Trend
This graph displays the total amount of each chemical released in these 3 months.
The graph shows that the released amount of every chemical tends to rise. For Appluimonia and Chlorodinine, the growth from April to December is almost linear. The amounts of Appluimonia and Chlorodinine released are always smaller than those of AGOC-3A and Methylosmolene. AGOC-3A ranks first throughout, followed by Methylosmolene. The release of AGOC-3A rises relatively sharply from April to August and then grows much more slowly from August to December. In contrast, the amount of Methylosmolene rises slowly from April to August but faster from August to December.
Pattern 2: Dramatic Fluctuation of volatile organic solvents
Figure: Line Charts of 4 Chemicals’ Daily Release
This graph shows the daily release of the 4 chemicals in detail. The releases of Appluimonia and Chlorodinine stay stable from day to day, and their daily sums are not high. By comparison, the daily releases of the two volatile organic solvents change much more dramatically. The release of Methylosmolene often starts to vary after the middle of the month. In August and December, the Methylosmolene readings rose suddenly and quickly fell back to their usual value. In contrast, large fluctuations of AGOC-3A are usually observed at the beginning and in the middle of the month, and its release calms down at the end of the month. This may be attributable to the fact that AGOC-3A and Methylosmolene are substitutes for each other.
Pattern 3: Time pattern of Methylosmolene’s release
Figure: Heatmap of Methylosmolene’s all time release
This graph is a punch card of the release of Methylosmolene. Using 5 as the threshold, we count the number of readings above the threshold. The larger a point is, the more readings above the threshold were recorded. It is easy to see that this chemical was usually released more in the evening than in the daytime. From 6:00 to 20:00 it is almost never released, while the count of readings above the threshold is usually large at 22:00. No such pattern is apparent for the other chemicals.
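The counts that drive the point sizes can be sketched as follows. The threshold of 5 comes from the text. The choice of (date, hour) as the punch-card cell, the record layout `(timestamp, reading)` and the toy values are assumptions made for illustration.

```python
from collections import Counter
from datetime import date, datetime

THRESHOLD = 5.0  # threshold stated in the text

def punch_card_counts(records, threshold=THRESHOLD):
    """Count readings strictly above the threshold per (date, hour) cell."""
    counts = Counter()
    for ts, reading in records:
        if reading > threshold:
            counts[(ts.date(), ts.hour)] += 1
    return counts

# Toy Methylosmolene readings pooled across sensors.
records = [
    (datetime(2016, 8, 1, 22), 7.5),
    (datetime(2016, 8, 1, 22), 6.0),
    (datetime(2016, 8, 1, 10), 0.3),
    (datetime(2016, 8, 2, 22), 5.0),  # equal to the threshold, so not counted
]
counts = punch_card_counts(records)
print(counts)
```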
MC2.3 – Which
factories are responsible for which chemical releases? Carefully describe how
you determined this using all the data you have available. For the factories
you identified, describe any observed patterns of operation revealed in the
data.
Limit your response to no more than 8 images and 1000 words.
All of the following analysis is based on the error-handled data described in Question 1.
The readings of each sensor are affected by wind, so the wind data must also be considered to determine the contribution of each factory. To better visualize the sensor data in combination with the wind information, we developed the system shown below. Each circle is placed at the relative position of the corresponding sensor and is a polar plot. The height of each bar in the polar plot stands for the average reading for that wind direction. The color of the bar also encodes the average value: red bars have larger values and green bars have smaller values. With this system it is easy to identify the direction with the highest reading for each sensor. If the highest bars of the majority of sensors point to one company, that company is very likely the main source of the chemical.
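The bar heights of such a polar plot can be sketched as the mean reading per wind-direction bin for each sensor. The example assumes each reading has already been joined with the wind direction at that time, giving `(monitor, wind_direction_deg, reading)` tuples. The 30-degree bin width and the toy values are arbitrary choices for illustration.

```python
from collections import defaultdict

BIN_WIDTH = 30  # degrees per bar; an assumed value

def mean_by_direction(records, bin_width=BIN_WIDTH):
    """Average reading per (monitor, direction bin); bin k covers [k*w, (k+1)*w) degrees."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for monitor, direction, reading in records:
        bin_index = int((direction % 360) // bin_width)
        sums[(monitor, bin_index)] += reading
        counts[(monitor, bin_index)] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Toy data for sensor 6: two readings in the 0-30 degree bin, one in the 180-210 degree bin.
records = [
    (6, 10, 2.0),
    (6, 20, 4.0),
    (6, 200, 1.0),
]
means = mean_by_direction(records)
print(means)
```

Averaging rather than summing keeps frequently occurring wind directions from dominating the plot simply because they have more samples.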
Appluimonia:
The following figure shows the average reading of each sensor for each wind direction over all given records.
Figure: Orientation Distribution of Appluimonia of all time.
This graph shows the sensor readings of Appluimonia. The highest bars of most sensors clearly point to the company Indigo Sol Boards. In particular, one red bar at sensor 6, the sensor located in the center of the companies’ region, points directly at this company, which means that most of its large readings came from this direction.
In conclusion, Indigo Sol Boards is the most likely source of Appluimonia.
Chlorodinine:
Figure: Orientation Distribution of Chlorodinine of all time.
This graph shows the sensor readings of Chlorodinine. The directions from which the large readings of sensor 1, sensor 3, sensor 4, sensor 6 and sensor 8 came intersect at the location of Roadrunner Fitness Electronics. This means that Chlorodinine was mainly released by Roadrunner Fitness Electronics.
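Whether a sensor's peak direction points at a given factory can also be checked numerically by comparing that direction with the compass bearing from the sensor to the factory. The sketch below assumes that wind direction is reported as the direction the wind blows from, that map coordinates have y increasing northward, and that a 15-degree tolerance is acceptable. The coordinates in the example are hypothetical and are not taken from the challenge map.

```python
import math

def bearing_deg(from_xy, to_xy):
    """Compass bearing (0 = north, 90 = east) from one map point to another."""
    dx = to_xy[0] - from_xy[0]
    dy = to_xy[1] - from_xy[1]
    return math.degrees(math.atan2(dx, dy)) % 360

def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180) % 360 - 180)

def points_at(sensor_xy, factory_xy, peak_wind_from_deg, tolerance_deg=15):
    """True if the sensor's peak wind direction matches the bearing to the factory."""
    bearing = bearing_deg(sensor_xy, factory_xy)
    return angular_difference(bearing, peak_wind_from_deg) <= tolerance_deg

# Hypothetical layout: the factory lies north-east of the sensor (bearing 45 degrees).
sensor = (0, 0)
factory = (10, 10)
print(points_at(sensor, factory, peak_wind_from_deg=50))
print(points_at(sensor, factory, peak_wind_from_deg=120))
```

Counting how many sensors satisfy `points_at` for each candidate factory gives a numeric counterpart to reading the intersection off the figure.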
Methylosmolene:
Figure: Orientation Distribution of Methylosmolene of all time.
This graph shows the sensor readings of Methylosmolene. In this figure, the high bars of all sensors intersect at the location of Kasios Office Furniture. Moreover, as found in Question 1, the changes in the Methylosmolene readings are larger and more sudden than those of the other chemicals. In this figure, that shows up as a few very large bars that overwhelm the others. Although the sensors may be more sensitive to this chemical, the figure still reflects a high level of Methylosmolene emission.
So these large readings clearly came from Kasios Office Furniture.
AGOC-3A:
Before the error handling was applied to the dataset, the reading distribution was:
Figure: Orientation Distribution of AGOC-3A of all time before error elimination.
After handling the errors described in Question 1 Pattern 2:
Figure: Orientation Distribution of AGOC-3A of all time after error elimination.
The suspicious contribution from Kasios Office Furniture is clearly eliminated by the error handling described in Question 1 Pattern 2 (when a repeated reading occurs, discard the larger value and keep the smaller one). Only Radiance ColourTek is pointed to in the second figure.
However, it is notable that only releases from Kasios Office Furniture cause this error. Considering the release of Methylosmolene by Kasios and the possible reaction between the two chemicals discussed in Question 1 Pattern 2, we suspect that it was precisely the high-level release of both AGOC-3A and Methylosmolene from Kasios that triggered the repeated readings, so that the error handling removed a genuine contribution. We therefore still consider Kasios Office Furniture the major source of AGOC-3A and Radiance ColourTek the secondary source.
Conclusion:
The contribution of each company to each chemical is shown in the following table:
| | Appluimonia | Chlorodinine | AGOC-3A | Methylosmolene |
|---|---|---|---|---|
| Indigo Sol Boards | 1 | | | |
| Roadrunner Fitness Electronics | | 1 | | |
| Radiance ColourTek | | | 1 | |
| Kasios Office Furniture | | | 1 | 1 |